Discrimination between distant homologs and structural analogs: lessons from manually constructed, reliable data sets.

Authors

  • Hua Cheng
  • Bong-Hyun Kim
  • Nick V Grishin
Abstract

A natural way to study protein sequence, structure, and function is to put them in the context of evolution. Homologs inherit similarities from their common ancestor, while analogs converge to similar structures due to a limited number of energetically favorable ways to pack secondary structural elements. Using novel strategies, we previously assembled two reliable databases of homologs and analogs. In this study, we compare these two data sets and develop a support vector machine (SVM)-based classifier to discriminate between homologs and analogs. The classifier uses a number of well-known similarity scores. We observe that although both structure scores and sequence scores contribute to SVM performance, profile sequence scores computed based on structural alignments are the best discriminators between remote homologs and structural analogs. We apply our classifier to a representative set from the expert-constructed database, Structural Classification of Proteins (SCOP). The SVM classifier recovers 76% of the remote homologs defined as domains in the same SCOP superfamily but from different families. More importantly, we also detect and discuss interesting homologous relationships between SCOP domains from different superfamilies, folds, and even classes.
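
The idea of feeding several similarity scores into an SVM can be illustrated with a minimal sketch. It assumes a linear SVM trained by batch subgradient descent on the hinge loss, with two hypothetical standardized features per domain pair (`structure_score`, `profile_score`). The abstract does not state the kernel, the exact feature set, or the training procedure, so none of this should be read as the authors' implementation, and every number below is invented toy data.

```python
# Toy sketch: linear SVM trained by batch subgradient descent on the hinge loss.
# All feature values are invented for illustration; they are NOT data from the paper.

def decision(w, b, x):
    """Signed score w.x + b; positive means 'homolog' in this sketch."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(X, y, lr=0.1, lam=0.001, epochs=200):
    """Minimize (lam/2)*|w|^2 + mean hinge loss with fixed-step subgradient descent."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        # Only pairs with margin below 1 contribute to the hinge subgradient.
        active = [i for i in range(n) if y[i] * decision(w, b, X[i]) < 1.0]
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for i in active:
            for j in range(dim):
                grad_w[j] -= y[i] * X[i][j] / n
            grad_b -= y[i] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return "homolog" if decision(w, b, x) > 0 else "analog"

# Hypothetical standardized features per domain pair: (structure_score, profile_score).
# Structure scores overlap between the classes; profile scores separate them.
X = [(0.1, 1.0), (-0.2, 0.8), (0.3, 1.2),      # pairs labeled homolog (+1)
     (0.2, -1.0), (-0.1, -0.9), (-0.3, -1.1)]  # pairs labeled analog (-1)
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
labels = [predict(w, b, x) for x in X]
print(labels)
print(predict(w, b, (0.2, 1.1)), predict(w, b, (0.05, -0.95)))
```

In this toy setup the structure score is deliberately similar for both classes, so the learned boundary leans mostly on the profile score, which loosely mirrors the abstract's observation that profile sequence scores are the strongest discriminators.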

Similar resources

MALISAM: a database of structurally analogous motifs in proteins

MALISAM (manual alignments for structurally analogous motifs) represents the first database containing pairs of structural analogs and their alignments. To find reliable analogs, we developed an approach based on three ideas. First, an insertion together with a part of the evolutionary core of one domain family (a hybrid motif) is analogous to a similar motif contained within the core of anothe...

HorA web server to infer homology between proteins using sequence and structural similarity

The biological properties of proteins are often gleaned through comparative analysis of evolutionary relatives. Although protein structure similarity search methods detect more distant homologs than purely sequence-based methods, structural resemblance can result from either homology (common ancestry) or analogy (similarity without common ancestry). While many existing web servers detect struct...

RNAv: Non-coding RNA Secondary Structure Variation Search via Graph Homomorphism

Non-coding RNA (ncRNA) secondary structural homologs can be detected effectively in genomes with profile-based search methods. However, due to the lack of appropriate ncRNA structural evolution models, it is difficult to accurately detect distant structural homologs, i.e., ncRNA structures with variations caused by evolutionary changes such as the insertion or deletion of a substantial portion ...

Contribution of electrostatic interactions, compactness and quaternary structure to protein thermostability: lessons from structural genomics of Thermotoga maritima.

Studies of the structural basis of protein thermostability have produced a confusing picture. Small sets of proteins have been analyzed from a variety of thermophilic species, suggesting different structural features as responsible for protein thermostability. Taking advantage of the recent advances in structural genomics, we have compiled a relatively large protein structure dataset, which was...

Performance evaluation of a new algorithm for the detection of remote homologs with sequence comparison.

A detailed analysis of the performance of hybrid, a new sequence alignment algorithm developed by Yu and coworkers that combines Smith-Waterman local dynamic programming with a local version of the maximum-likelihood approach, was made to assess the applicability of this algorithm to the detection of distant homologs by sequence comparison. We analyzed the statistics of hybrid with a set of non...

Journal:
  • Journal of Molecular Biology

Volume 377, Issue 4

Pages: -

Publication date: 2008